{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 场景二：多媒体资源的摘要"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pulling manifest\n",
      "pulling 8359bebea988... 100% |██████████████████| (7.4/7.4 GB, 61 TB/s)        \n",
      "pulling 65c6ec5c6ff0... 100% |████████████████████| (45/45 B, 955 kB/s)        \n",
      "pulling dd36891f03a0... 100% |████████████████████| (31/31 B, 814 kB/s)        \n",
      "pulling f94f529485e6... 100% |███████████████████| (382/382 B, 18 MB/s)        \n",
      "verifying sha256 digest\n",
      "writing manifest\n",
      "removing any unused layers\n",
      "success\n"
     ]
    }
   ],
   "source": [
    "!ollama pull llama2-chinese:13b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install langchain langchain-core langchain-community\n",
    "%pip install arxiv pymupdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'Published': '2023-03-10', 'Title': 'ReAct: Synergizing Reasoning and Acting in Language Models', 'Authors': 'Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao', 'Summary': 'While large language models (LLMs) have demonstrated impressive capabilities\\nacross tasks in language understanding and interactive decision making, their\\nabilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g.\\naction plan generation) have primarily been studied as separate topics. In this\\npaper, we explore the use of LLMs to generate both reasoning traces and\\ntask-specific actions in an interleaved manner, allowing for greater synergy\\nbetween the two: reasoning traces help the model induce, track, and update\\naction plans as well as handle exceptions, while actions allow it to interface\\nwith external sources, such as knowledge bases or environments, to gather\\nadditional information. We apply our approach, named ReAct, to a diverse set of\\nlanguage and decision making tasks and demonstrate its effectiveness over\\nstate-of-the-art baselines, as well as improved human interpretability and\\ntrustworthiness over methods without reasoning or acting components.\\nConcretely, on question answering (HotpotQA) and fact verification (Fever),\\nReAct overcomes issues of hallucination and error propagation prevalent in\\nchain-of-thought reasoning by interacting with a simple Wikipedia API, and\\ngenerates human-like task-solving trajectories that are more interpretable than\\nbaselines without reasoning traces. On two interactive decision making\\nbenchmarks (ALFWorld and WebShop), ReAct outperforms imitation and\\nreinforcement learning methods by an absolute success rate of 34% and 10%\\nrespectively, while being prompted with only one or two in-context examples.\\nProject site with code: https://react-lm.github.io'}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'\\n这篇论文在ICLR 2023上发表，研究了如何兼顾理解和行动的能力。目前的大型语言模型(LLMs)已经成功地应用于许多语言理解和交互式决策任务，但它们的理解和行为两个方面主要是另外两个研究主题。我们将LLMs用于在逻辑追踪和任务特定的动作之间进行更加合作的使用，以此来产生更好的结果。我们命名为ReAct，并应用它到了多种语言和决策任务中，并取得了超越比较性好几个基线的表现。'"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_core.prompts import PromptTemplate, format_document\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_community.chat_models import ChatOllama\n",
    "from langchain_community.document_loaders import ArxivLoader\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "# 加载 arXiv 上的论文《ReAct: Synergizing Reasoning and Acting in Language Models》\n",
    "loader = ArxivLoader(query=\"2210.03629\", load_max_docs=1)\n",
    "docs = loader.load()\n",
    "print(docs[0].metadata)\n",
    "\n",
    "# 把文本分割成 500 字一组的切片\n",
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    chunk_size = 500,\n",
    "    chunk_overlap = 0\n",
    ")\n",
    "chunks = text_splitter.split_documents(docs)\n",
    "\n",
    "# 构建 Stuff 形态（即文本直接拼合）的总结链\n",
    "doc_prompt = PromptTemplate.from_template(\"{page_content}\")\n",
    "chain = (\n",
    "    {\n",
    "        \"content\": lambda docs: \"\\n\\n\".join(\n",
    "            format_document(doc, doc_prompt) for doc in docs\n",
    "        )\n",
    "    }\n",
    "    | PromptTemplate.from_template(\"用中文总结以下内容，不需要人物介绍，字数控制在 50 字以内：\\n\\n{content}\")\n",
    "    | ChatOllama(model=\"llama2-chinese:13b\")\n",
    "    | StrOutputParser()\n",
    ")\n",
    "# 由于论文很长，我们只选取前 2000 字作为输入并调用总结链\n",
    "chain.invoke(chunks[:4])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'This paper introduces the REACT model which leverages both reasoning and acting abilities to improve the performance of large language models, achieving state-of-the-art results in various benchmarks. The authors present a novel approach that combines logic-based reasoning with behavior-based actions in LLMs, which enables them to better handle tasks such as question answering and text generation. Additionally, ReAct is an algorithm that combines chain-of-thought reasoning with simple Wikipedia API and generates human-like task-solving trajectories. It can outperform imitation and reinforcement learning methods in two interactive decision making benchmarks, achieving absolute success rates of 34% and 10%. Their proposed method is evaluated on several datasets and is shown to significantly outperform other baseline models, demonstrating its potential for improving the capabilities of language models. By using reasoning traces to help induce, track, and update action plans as well as handle unexpected events during execution, this approach has the potential to improve the overall performance of LLMs. The ReAct model combines both reasoning and acting abilities of large language models to achieve state-of-the-art results in various benchmarks. It also demonstrates its potential for improving the capabilities of language models by using reasoning traces to help induce, track, and update action plans as well as handle unexpected events during execution.'"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from functools import partial\n",
    "\n",
    "from langchain_community.chat_models import ChatOllama\n",
    "from langchain_core.prompts import PromptTemplate, format_document\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_community.document_loaders import ArxivLoader\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "# 加载 arXiv 上的论文《ReAct: Synergizing Reasoning and Acting in Language Models》\n",
    "loader = ArxivLoader(query=\"2210.03629\", load_max_docs=1)\n",
    "docs = loader.load()\n",
    "\n",
    "# 把文本分割成 500 字一组的切片\n",
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    chunk_size = 500,\n",
    "    chunk_overlap = 50\n",
    ")\n",
    "chunks = text_splitter.split_documents(docs)\n",
    "\n",
    "llm = ChatOllama(model=\"llama2-chinese:13b\")\n",
    "\n",
    "# 构建工具函数：将 Document 转换成字符串\n",
    "document_prompt = PromptTemplate.from_template(\"{page_content}\")\n",
    "partial_format_document = partial(format_document, prompt=document_prompt)\n",
    "\n",
    "# 构建 Map 链：对每个文档都先进行一轮总结\n",
    "map_chain = (\n",
    "    {\"context\": partial_format_document}\n",
    "    | PromptTemplate.from_template(\"Summarize this content:\\n\\n{context}\")\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")\n",
    "\n",
    "# 构建 Reduce 链：合并之前的所有总结内容\n",
    "reduce_chain = (\n",
    "    {\"context\": lambda strs: \"\\n\\n\".join(strs) }\n",
    "    | PromptTemplate.from_template(\"Combine these summaries:\\n\\n{context}\")\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")\n",
    "\n",
    "# 把两个链合并成 MapReduce 链\n",
    "map_reduce = map_chain.map() | reduce_chain\n",
    "map_reduce.invoke(chunks[:4], config={\"max_concurrency\": 5})\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"In this paper, we propose a novel approach called REACT that integrates reasoning traces and acting capabilities within a single framework to improve the overall performance of large language models (LLMs). By interleaving reasoning and acting, we can synergize the two cognitive abilities and enhance the capabilities of LLMs in various applications such as natural language processing, human-computer interaction, and cognitive systems.\\n\\nOur approach addresses issues of hallucination and error propagation in chain-of-thought reasoning by interacting with a simple Wikipedia API and generating human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. Furthermore, on two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with natural language commands.\\n\\nREACT's potential for improving the capabilities of LLMs in various applications is significant, especially when it comes to dealing with open-ended questions or conversation scenarios where reasoning traces are essential to handling ambiguity and uncertainty. By leveraging the strengths of reasoning and acting together with synergistic integration, REACT has the potential to revolutionize various applications of language and cognitive systems.\\n\\nIn addition, we explore different components of REACT to provide insights into how they contribute to its overall performance. We also suggest potential avenues for future research to further enhance the capabilities of LLMs. Our proposed model, REACT, has the potential to overcome existing limitations and provide improved human interpretability and trustworthiness in various applications.\\n\""
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from functools import partial\n",
    "from operator import itemgetter\n",
    "\n",
    "from langchain_community.chat_models import ChatOllama\n",
    "from langchain_core.prompts import PromptTemplate, format_document\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_community.document_loaders import ArxivLoader\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "# 加载 arXiv 上的论文《ReAct: Synergizing Reasoning and Acting in Language Models》\n",
    "loader = ArxivLoader(query=\"2210.03629\", load_max_docs=1)\n",
    "docs = loader.load()\n",
    "\n",
    "# 把文本分割成 500 字一组的切片\n",
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    chunk_size = 500,\n",
    "    chunk_overlap = 50\n",
    ")\n",
    "chunks = text_splitter.split_documents(docs)\n",
    "\n",
    "llm = ChatOllama(model=\"llama2-chinese:13b\")\n",
    "\n",
    "# 构建工具函数：将 Document 转换成字符串\n",
    "document_prompt = PromptTemplate.from_template(\"{page_content}\")\n",
    "partial_format_document = partial(format_document, prompt=document_prompt)\n",
    "\n",
    "# 构建 Context 链：总结第一个文档并作为后续总结的上下文\n",
    "first_prompt = PromptTemplate.from_template(\"Summarize this content:\\n\\n{context}\")\n",
    "context_chain = {\"context\": partial_format_document} | first_prompt | llm | StrOutputParser()\n",
    "\n",
    "# 构建 Refine 链：基于上下文（上一次的总结）和当前内容进一步总结\n",
    "refine_prompt = PromptTemplate.from_template(\n",
    "    \"Here's your first summary: {prev_response}. \"\n",
    "    \"Now add to it based on the following context: {context}\"\n",
    ")\n",
    "refine_chain = (\n",
    "    {\n",
    "        \"prev_response\": itemgetter(\"prev_response\"),\n",
    "        \"context\": lambda x: partial_format_document(x[\"doc\"]),\n",
    "    }\n",
    "    | refine_prompt\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")\n",
    "\n",
    "# 构建一个负责执行 Refine 循环的函数 \n",
    "def refine_loop(docs):\n",
    "    summary = context_chain.invoke(docs[0])\n",
    "    for i, doc in enumerate(docs[1:]):\n",
    "        summary = refine_chain.invoke({\"prev_response\": summary, \"doc\": doc})\n",
    "    return summary\n",
    "  \n",
    "refine_loop(chunks[:4])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['你好LangChai', 'gChain实战']\n",
      "['你好', 'LangChain', '实战']\n"
     ]
    }
   ],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=10, chunk_overlap=5)\n",
    "print(text_splitter.split_text(\"你好LangChain实战\"))\n",
    "print(text_splitter.split_text(\"你好 LangChain 实战\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content='12')"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from operator import itemgetter\n",
    "\n",
    "from langchain_core.runnables import RunnableLambda\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_community.chat_models import ChatOllama\n",
    "\n",
    "# 单个参数的函数可以直接被 RunnableLambda 封装\n",
    "def length_function(text):\n",
    "    return len(text)\n",
    "\n",
    "# 多个个参数的函数需要先被封装成单个参数的函数，再传给 RunnableLambda\n",
    "def _multiple_length_function(text1, text2):\n",
    "    return len(text1) * len(text2)\n",
    "\n",
    "def multiple_length_function(_dict):\n",
    "    return _multiple_length_function(_dict[\"text1\"], _dict[\"text2\"])\n",
    "\n",
    "\n",
    "prompt = ChatPromptTemplate.from_template(\"what is {a} + {b}\")\n",
    "model = ChatOllama(model=\"llama2-chinese:13b\")\n",
    "\n",
    "chain1 = prompt | model\n",
    "\n",
    "chain = (\n",
    "    {\n",
    "        \"a\": itemgetter(\"foo\") | RunnableLambda(length_function),\n",
    "        \"b\": {\"text1\": itemgetter(\"foo\"), \"text2\": itemgetter(\"bar\")}\n",
    "        | RunnableLambda(multiple_length_function),\n",
    "    }\n",
    "    | prompt\n",
    "    | model\n",
    ")\n",
    "chain.invoke({\"foo\": \"bar\", \"bar\": \"gah\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'joke': AIMessage(content='有一只小白兔，他很喜欢吃花草。有天，他吃了太多的草丛了嘴子，家人看到他就开始笑起来说：“你好像被草踩住了！”小白兔半逼、半害羡地说：“是的，我已经吃完了。”'),\n",
       " 'poem': AIMessage(content='嫩潤的小白兔\\n在草地上玩耍。\\n黑脸和白身，\\n像小羊的朋友。\\n')}"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_community.chat_models import ChatOllama\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_core.runnables import RunnableParallel\n",
    "\n",
    "model = ChatOllama(model=\"llama2-chinese:13b\")\n",
    "\n",
    "joke_chain = ChatPromptTemplate.from_template(\"讲一句关于{topic}的笑话\") | model\n",
    "poem_chain = ChatPromptTemplate.from_template(\"写一首关于{topic}的短诗\") | model\n",
    "\n",
    "# 通过 RunnableParallel（也可以叫做 RunnableMap）来并行执行两个调用链\n",
    "map_chain = RunnableParallel(joke=joke_chain, poem=poem_chain)\n",
    "map_chain.invoke({\"topic\": \"小白兔\"})"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
